123 research outputs found

    Linear Subspace Learning for Facial Expression Analysis

    Get PDF

    Inferring Facial and Body Language

    Get PDF
    Machine analysis of human facial and body language is a challenging topic in computer vision, impacting on important applications such as human-computer interaction and visual surveillance. In this thesis, we present research building towards computational frameworks capable of automatically understanding facial expression and behavioural body language. The thesis work commences with a thorough examination in issues surrounding facial representation based on Local Binary Patterns (LBP). Extensive experiments with different machine learning techniques demonstrate that LBP features are efficient and effective for person-independent facial expression recognition, even in low-resolution settings. We then present and evaluate a conditional mutual information based algorithm to efficiently learn the most discriminative LBP features, and show the best recognition performance is obtained by using SVM classifiers with the selected LBP features. However, the recognition is performed on static images without exploiting temporal behaviors of facial expression. Subsequently we present a method to capture and represent temporal dynamics of facial expression by discovering the underlying low-dimensional manifold. Locality Preserving Projections (LPP) is exploited to learn the expression manifold in the LBP based appearance feature space. By deriving a universal discriminant expression subspace using a supervised LPP, we can effectively align manifolds of different subjects on a generalised expression manifold. Different linear subspace methods are comprehensively evaluated in expression subspace learning. We formulate and evaluate a Bayesian framework for dynamic facial expression recognition employing the derived manifold representation. However, the manifold representation only addresses temporal correlations of the whole face image, does not consider spatial-temporal correlations among different facial regions. We then employ Canonical Correlation Analysis (CCA) to capture correlations among face parts. To overcome the inherent limitations of classical CCA for image data, we introduce and formalise a novel Matrix-based CCA (MCCA), which can better measure correlations in 2D image data. We show this technique can provide superior performance in regression and recognition tasks, whilst requiring significantly fewer canonical factors. All the above work focuses on facial expressions. However, the face is usually perceived not as an isolated object but as an integrated part of the whole body, and the visual channel combining facial and bodily expressions is most informative. Finally we investigate two understudied problems in body language analysis, gait-based gender discrimination and affective body gesture recognition. To effectively combine face and body cues, CCA is adopted to establish the relationship between the two modalities, and derive a semantic joint feature space for the feature-level fusion. Experiments on large data sets demonstrate that our multimodal systems achieve the superior performance in gender discrimination and affective state analysis.Research studentship of Queen Mary, the International Travel Grant of the Royal Academy of Engineering, and the Royal Society International Joint Project

    RGB-T salient object detection via fusing multi-level CNN features

    Get PDF
    RGB-induced salient object detection has recently witnessed substantial progress, which is attributed to the superior feature learning capability of deep convolutional neural networks (CNNs). However, such detections suffer from challenging scenarios characterized by cluttered backgrounds, low-light conditions and variations in illumination. Instead of improving RGB based saliency detection, this paper takes advantage of the complementary benefits of RGB and thermal infrared images. Specifically, we propose a novel end-to-end network for multi-modal salient object detection, which turns the challenge of RGB-T saliency detection to a CNN feature fusion problem. To this end, a backbone network (e.g., VGG-16) is first adopted to extract the coarse features from each RGB or thermal infrared image individually, and then several adjacent-depth feature combination (ADFC) modules are designed to extract multi-level refined features for each single-modal input image, considering that features captured at different depths differ in semantic information and visual details. Subsequently, a multi-branch group fusion (MGF) module is employed to capture the cross-modal features by fusing those features from ADFC modules for a RGB-T image pair at each level. Finally, a joint attention guided bi-directional message passing (JABMP) module undertakes the task of saliency prediction via integrating the multi-level fused features from MGF modules. Experimental results on several public RGB-T salient object detection datasets demonstrate the superiorities of our proposed algorithm over the state-of-the-art approaches, especially under challenging conditions, such as poor illumination, complex background and low contrast

    Learning gender from human gaits and faces

    Get PDF
    Computer vision based gender classification is an important component in visual surveillance systems. In this paper, we investigate gender classification from human gaits in image sequences, a relatively understudied problem. Moreover, we propose to fuse gait and face for improved gender discrimination. We exploit Canonical Correlation Analysis (CCA), a powerful tool that is well suited for relating two sets of measurements, to fuse the two modalities at the feature level. Experiments demonstrate that our multimodal gender recognition system achieves the superior recognition performance of 97.2 % in large datasets. In this paper, we investigate gender classification from human gaits in image sequences using machine learning methods. Considering each modality, face or gait, in isolation has its inherent weakness and limitations, we further propose to fuse gait and face for improved gender discrimination. We exploit Canonical Correlation Analysis (CCA), a powerful tool that is well suited for relating two sets of signals, to fuse the two modalities at the feature level. Experiments on large dataset demonstrate that our multimodal gender recognition system achieves the superior recognition performance of 97.2%. We plot in Figure 1 the flow chart of our multimodal gender recognition system. 1

    Device for obtaining respiratory information of a subject

    Get PDF
    A device and method for reliably and accurately obtaining respiratory information of a subject despite motion of the subject are disclosed. The proposed device comprises a motion signal computing unit for computing a number M of motion signals for a plurality of pixels and/or groups of pixels of at least a region of interest for a number N of image frames of a subject, a transformation unit for computing, for some or all M motion signals, a number of source signals representing independent motions within said images by applying a transformation to the respective motion signals to obtain source signals representing independent motions within said N image frames, and a selection unit for selecting a source signal from among said computed source signals representing respiration of said subject by examining one or more properties of said source signals for some or all of said computed source signals
    • …
    corecore